ABSTRACT

Estimating the population total in two-stage survey sampling is considered, making use of a (superpopulation) model. The problem is then really one of predicting the unobserved part of the total, and the concept of predictive likelihood is studied. Prediction intervals and a predictor for the population total are derived for the normal case, based on predictive likelihood.

1. INTRODUCTION 

Two-stage surveys are used in sampling from finite populations of, say, N primary units or clusters, where each primary unit consists of $m_i$ secondary units. N is assumed known, but the $m_i$'s are unknown before sampling. Let $y_{ij}$ be the value of the variable of interest for secondary unit j of the i'th primary unit. The problem is to estimate the total

$$t=\sum_{i=1}^{N}\sum_{j=1}^{m_i}y_{ij}.$$

An example of this situation is considered in Thomsen and Tesfu (1988), with t being the size of a particular population. The primary units are certain administrative units, the secondary units are households and $y_{ij}$ is the number of persons in household j of the i'th administrative unit.

We assume that, before sampling, other measures of the sizes of the primary units are available to us. Let $x_1,\dots,x_N$ be these measures and let $X=\sum_{i=1}^{N}x_i$.

The sampling plan is as follows: At stage 1 a sample s of size $n_0$ of the primary units $(1,\dots,N)$ is selected according to some sampling design, and at stage 2 we select for each $i\in s$ a sample $s_i$ of size $n_i$ of secondary units, possibly using a different sampling design than at stage 1. The designs are assumed to be non-informative, i.e. they do not depend on the $y_{ij}$'s and $m_i$'s. E.g., in Thomsen and Tesfu (1988) the two-stage sampling plan is to use pps-sampling at stage 1 (letting the selection probabilities of the primary units be proportional to the $x_i$'s) and simple random sampling (srs) at stage 2.

The total sample size is $n=\sum_{i\in s}n_i$, and our data now consist of $y(s)=\{y_{ij}: i\in s,\ j\in s_i\}$ and $m(s)=\{m_i: i\in s\}$. Let $y=(y(s),m(s))$. For the pps-srs sampling plan mentioned above, a commonly used design-unbiased estimator of t is the modified Horvitz-Thompson estimator (see for example Cochran (1977), chapter 11)

$$\hat T_{HT}=\frac{X}{n_0}\sum_{i\in s}\frac{m_i\bar y_i}{x_i}, \qquad (1)$$

where $\bar y_i=\sum_{j\in s_i}y_{ij}/n_i$.
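A minimal sketch of (1), assuming Python with NumPy; the function name and argument layout are our own, not from the paper. It needs only the sampled $m_i$, $x_i$, $\bar y_i$ and the known total $X$:

```python
import numpy as np


def modified_ht(X_total, m_s, x_s, ybar_s):
    """Modified Horvitz-Thompson estimator (1) for pps-srs two-stage sampling.

    X_total : X, the sum of the size measures x_i over all N primary units
    m_s     : observed cluster sizes m_i, i in s
    x_s     : size measures x_i, i in s
    ybar_s  : within-cluster sample means ybar_i, i in s
    """
    m_s, x_s, ybar_s = (np.asarray(a, dtype=float) for a in (m_s, x_s, ybar_s))
    n0 = m_s.size  # number of sampled primary units
    # m_i * ybar_i estimates the cluster total; X / x_i expands it to the population
    return X_total / n0 * np.sum(m_s * ybar_s / x_s)


# Hypothetical example: two sampled clusters from a population with X = 200
print(modified_ht(200.0, m_s=[10, 20], x_s=[50, 100], ybar_s=[3.0, 4.0]))
```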

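In this paper a (superpopulation) model is adopted, regarding $m_i, y_{ij}$ as realized values of random variables $M_i, Y_{ij}$ for $j=1,\dots,M_i$ and $i=1,\dots,N$. $M=(M_1,\dots,M_N)$ is assumed independent of all $Y_{ij}$, and further:

$$E(M_i)=\beta x_i,\qquad \mathrm{Var}(M_i)=\sigma^2v(x_i),\qquad \mathrm{Cov}(M_i,M_j)=0\ (i\ne j),$$
$$E(Y_{ij})=\mu,\qquad \mathrm{Var}(Y_{ij})=\tau^2,\qquad \mathrm{Cov}(Y_{ij},Y_{ik})=\rho\tau^2\ \text{if } k\ne j,\qquad \mathrm{Cov}(Y_{ij},Y_{lk})=0\ \text{if } l\ne i. \qquad (2)$$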

Let $\theta=(\beta,\sigma,\mu,\tau,\rho)$ with $\rho\ge 0$, and let $v_s=\sum_{i\in s}v(x_i)$ and $v_{\bar s}=\sum_{i\notin s}v(x_i)$. Typically $v(x)=x^g$ with $0\le g\le 2$.
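To make model (2) concrete, the following sketch draws one population from a normal version of it (normality is only assumed later, in Section 3). The function name is hypothetical; rounding $M_i$ to a positive integer is our own assumption, since (2) treats $M_i$ as a real-valued random variable, and the shared cluster effect is one standard way to obtain $\mathrm{Var}(Y_{ij})=\tau^2$ and $\mathrm{Cov}(Y_{ij},Y_{ik})=\rho\tau^2$.

```python
import numpy as np


def simulate_population(x, beta, sigma, mu, tau, rho, v=lambda x: x, seed=None):
    """Draw (M_1..M_N) and the values Y_ij from a normal version of model (2).

    Returns the integer cluster sizes M and a list y with y[i] = (Y_i1, ..., Y_iM_i).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # M_i ~ N(beta x_i, sigma^2 v(x_i)), independent; rounded and kept >= 1 (our assumption)
    M = np.maximum(1, np.rint(rng.normal(beta * x, sigma * np.sqrt(v(x))))).astype(int)
    y = []
    for m_i in M:
        a_i = rng.normal()            # cluster effect shared by all j in cluster i
        e_ij = rng.normal(size=m_i)   # unit-level noise
        # Var(Y_ij) = tau^2 (rho + 1 - rho) = tau^2; within-cluster covariance = rho tau^2
        y.append(mu + tau * (np.sqrt(rho) * a_i + np.sqrt(1 - rho) * e_ij))
    return M, y


# Hypothetical draw with the size measures of case (I) in the simulation study
M, y = simulate_population([50, 50, 50, 30, 30, 100, 100, 100, 50, 50],
                           beta=1.0, sigma=1.0, mu=5.0, tau=1.0, rho=0.5, seed=1)
t = sum(y_i.sum() for y_i in y)  # the realized population total t
```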

Royall (1976) considers a similar model for $Y_{ij}$, assuming the $m_i$'s are known, while Royall (1986) also considers unknown $m_i$'s with a model similar to (2).

The total t is now a realized value of a random variable T, where T can be expressed as $T=\sum_{i\in s}\sum_{j\in s_i}Y_{ij}+Z$ with

$$Z=\sum_{i\in s}\sum_{j\notin s_i}Y_{ij}+\sum_{i\notin s}\sum_{j=1}^{M_i}Y_{ij}. \qquad (3)$$

Expressing the total T in this form, we see that the problem can be described as one of predicting the unobserved value z of the random variable Z. It is often clarifying to write a predictor $\hat T$ of T in the form

$$\hat T=\sum_{i\in s}\sum_{j\in s_i}y_{ij}+\hat Z, \qquad (4)$$

where $\hat Z$ then implicitly is a predictor of Z. From this point of view $\hat T_{HT}$, given by (1), does not look like a reasonable predictor. Royall (1970) seems to have been the first to realize the value of representing any predictor in the prediction form (4); see also Smith (1976).

Modelling the population in survey sampling problems has been and still is controversial. An important aspect of this issue is that the likelihood principle in a sense makes it necessary to model the population. Without a model the only stochastic elements are the samples s and $s_i$, $i\in s$, and the likelihood function is then flat (see, e.g., Cassel et al. (1977)), which means that from the likelihood principle point of view the data contain no information about the unobserved $y_{ij}$'s and $m_i$'s. To make inference we therefore need to relate the data to the unobserved values somehow, and the most natural way of doing so is to formulate a model (see also remarks by Berger and Wolpert (1984, p. 114)).

The random variables observed are $Y(s)$, $M(s)$ and s, where s now is ancillary. The likelihood principle implies that inference should depend only on the actual s observed and not on the sampling design. This is called the prediction approach to survey sampling and will be adopted in this paper. Hence everything is considered conditional on s. The prediction approach aims at choosing a predictor that is good for the actual s obtained, and it has given significant contributions to a better understanding of several problems in survey sampling, some of which are mentioned in Thomsen and Tesfu (1988). It also enables one to use more conventional statistical methods, although the problem is not to make inference about θ but rather to predict Z. Hence θ basically plays the role of a nuisance parameter.

To predict Z we shall use the concept of predictive likelihood, a non-Bayesian likelihood approach to prediction problems in general. One can argue that in the context of a superpopulation model, survey sampling provides one of the more natural prediction problems in statistics, and predictive likelihood could therefore serve as a basis for essentially all problems of this kind in survey sampling. Some major references to the general theory of predictive likelihood are Hinkley (1979), Mathiasen (1979) and Butler (1986). A review of some of the suggested likelihoods is given in Bjørnstad (1990).

Section 2 introduces the concept of predictive likelihood and shows how predictors and prediction intervals can be constructed from a predictive likelihood.

In Section 3 a predictive likelihood is derived for the normal model. The usual approaches to obtaining a predictive likelihood do not work in two-stage sampling, mainly because Z is a sum of a stochastic number of random variables. Therefore a modification is suggested.

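The predictor obtained from the predictive likelihood is given by:

$$\hat Z_0=E_{\hat\theta}(Z\mid y)=\sum_{i\in s}(m_i-n_i)\left\{\frac{1-\hat\rho}{1-\hat\rho+n_i\hat\rho}\,\hat\mu+\frac{n_i\hat\rho}{1-\hat\rho+n_i\hat\rho}\,\bar y_i\right\}+\hat\beta\hat\mu\sum_{i\notin s}x_i .$$

Here, $\bar y_i=\sum_{j\in s_i}y_{ij}/n_i$ and $\hat\theta=(\hat\beta,\hat\sigma,\hat\mu,\hat\tau,\hat\rho)$ is the MLE. With $w_s=\sum_{i\in s}x_i^2/v(x_i)$,

$$\hat\beta=\Big\{\sum_{i\in s}m_ix_i/v(x_i)\Big\}\Big/w_s. \qquad (5)$$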
Since $\hat\beta$ is the weighted least squares estimator, it is the best linear unbiased estimator of β (and, under normality, the best unbiased estimator). Let $w_i=(1-\hat\rho)/(1-\hat\rho+n_i\hat\rho)$.

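Writing $\hat Z_0=\sum_{i\in s}(m_i-n_i)\{w_i\hat\mu+(1-w_i)\bar y_i\}+\big(\sum_{i\notin s}\hat\beta x_i\big)\hat\mu$, we see from (3) that predicting Z by $\hat Z_0$ means that for $i\notin s$ each unobserved $Y_{ij}$ is predicted by $\hat\mu$ and $M_i$ is predicted by $\hat\beta x_i$. For $i\in s$, $j\notin s_i$, $Y_{ij}$ is predicted by $w_i\hat\mu+(1-w_i)\bar y_i$, a weighted average of $\hat\mu$ and the cluster mean $\bar y_i$.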
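Once $\hat\mu$ and $\hat\rho$ are available, $\hat Z_0$ is a direct computation. The sketch below (function names are ours) takes them as inputs, computes $\hat\beta$ from (5), and applies the unit-by-unit prediction rule just described. With $v(x)=x$, (5) reduces to the ratio $\sum_{i\in s}m_i/\sum_{i\in s}x_i$.

```python
import numpy as np


def beta_hat(m_s, x_s, v=lambda x: x):
    """Weighted least squares estimator (5) of beta."""
    m_s, x_s = np.asarray(m_s, float), np.asarray(x_s, float)
    w_s = np.sum(x_s**2 / v(x_s))
    return np.sum(m_s * x_s / v(x_s)) / w_s


def predict_Z0(m_s, n_s, ybar_s, x_s, x_sbar, mu_hat, rho_hat, v=lambda x: x):
    """Predictor Z0 of the unobserved part Z of the total, for given mu_hat, rho_hat."""
    m_s, n_s, ybar_s = (np.asarray(a, float) for a in (m_s, n_s, ybar_s))
    w = (1 - rho_hat) / (1 - rho_hat + n_s * rho_hat)  # weights w_i
    # i in s: each of the m_i - n_i unobserved Y_ij is predicted by w_i mu + (1 - w_i) ybar_i
    sampled = np.sum((m_s - n_s) * (w * mu_hat + (1 - w) * ybar_s))
    # i not in s: M_i is predicted by beta_hat x_i and each Y_ij by mu_hat
    unsampled = beta_hat(m_s, x_s, v) * np.sum(x_sbar) * mu_hat
    return sampled + unsampled
```

The predicted total is then $\sum_{i\in s}\sum_{j\in s_i}y_{ij}+\hat Z_0$, as in (4). Note that $\rho=0$ gives $w_i=1$ (the cluster means are ignored), while $\rho\to1$ gives $w_i\to0$.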

Three prediction intervals for Z based on similar predictive likelihoods are constructed. They are all of the form $\hat Z_0\pm u(\alpha/2)\sqrt{V_p(Z)}$, where $u(\alpha/2)$ is the upper $(\alpha/2)$-point of N(0, 1). $V_p(Z)$ is a measure of the uncertainty in predicting Z, of the form $V_p(Z)=V_{\hat\theta}(Z\mid y)+$ (term for parameter uncertainty); see (18).

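With $X_{\bar s}=\sum_{i\notin s}x_i$ and $u_{\bar s}=\sum_{i\notin s}x_i^2$,

$$V_\theta(Z\mid y)=\tau^2\sum_{i\in s}(m_i-n_i)(1-\rho)\left(1+\frac{(m_i-n_i)\rho}{1-\rho+n_i\rho}\right)+\tau^2\left\{\beta X_{\bar s}+\rho\left(\sigma^2v_{\bar s}+\beta^2u_{\bar s}-\beta X_{\bar s}\right)\right\}+\mu^2\sigma^2v_{\bar s}. \qquad (6)$$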
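Formula (6) can be evaluated directly. In the sketch below (names are ours), the first sum is the conditional variance of the unobserved part of each sampled cluster given $\bar y_i$, and the remaining terms come from $E\{\mathrm{Var}(\cdot\mid M)\}+\mathrm{Var}\{E(\cdot\mid M)\}$ for the non-sampled clusters. Plugging in $\hat\theta$ gives $V_{\hat\theta}(Z\mid y)$.

```python
import numpy as np


def cond_var_Z(m_s, n_s, x_sbar, beta, sigma2, mu, tau2, rho, v=lambda x: x):
    """V_theta(Z | y) as in (6), for given parameter values."""
    m_s, n_s, x_sbar = (np.asarray(a, float) for a in (m_s, n_s, x_sbar))
    r = m_s - n_s  # number of unobserved units in each sampled cluster
    sampled = tau2 * np.sum(r * (1 - rho) * (1 + r * rho / (1 - rho + n_s * rho)))
    X_sbar = x_sbar.sum()        # sum of x_i, i not in s
    u_sbar = np.sum(x_sbar**2)   # sum of x_i^2, i not in s
    v_sbar = np.sum(v(x_sbar))   # sum of v(x_i), i not in s
    # Non-sampled clusters: E{Var(sum | M)} + Var{E(sum | M)}
    unsampled = (tau2 * (beta * X_sbar + rho * (sigma2 * v_sbar + beta**2 * u_sbar - beta * X_sbar))
                 + mu**2 * sigma2 * v_sbar)
    return sampled + unsampled
```

For $\rho=0$ the expression reduces to $\tau^2\{\sum_{i\in s}(m_i-n_i)+\beta\sum_{i\notin s}x_i\}+\mu^2\sigma^2v_{\bar s}$, which is a convenient check.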

For large $n_0$, the three intervals are practically identical. However, for small $n_0$ they differ considerably. To illustrate this, confidence levels are estimated by simulation for $1-\alpha=.95$, $n_0=6$, $N=10$, $v(x)=x$ and selected values of $(x_1,\dots,x_N)$ and θ.

In a subsequent paper a more comprehensive simulation study for estimating confidence levels will be undertaken, as well as a consideration of optimality for model-unbiased predictors.

2. PREDICTIVE LIKELIHOOD 

We shall here give a brief general introduction to the concept of predictive likelihood. For a more complete exposition we refer to Bjørnstad (1990). Let Y = y be the data. The problem is to predict the unobserved or future value z of a random variable Z, usually by a predictor and a prediction interval for Z. It is assumed that (Y, Z) has a probability density or mass function (pdf) $f_\theta(y,z)$. In general we let $f_\theta(\cdot)$ and $f_\theta(\cdot\mid\cdot)$ denote the pdf and conditional pdf of the enclosed variables. The joint likelihood function for the two unknown quantities, z and θ, is given by $l_y(z,\theta)=f_\theta(y,z)$. The aim is to develop a likelihood for z, $L(z\mid y)$, by eliminating θ from $l_y$. Any such likelihood is called a predictive likelihood.

Different ways of eliminating θ then give rise to different L. The two main types of suggestions are the conditional predictive likelihood $L_c$, essentially suggested by Hinkley (1979), and the profile predictive likelihood $L_p$, first considered by Mathiasen (1979). Let $R=r(Y,Z)$ denote a minimal sufficient statistic for θ based on (Y, Z). Then

$$L_c(z\mid y)=f_\theta(y,z)/f_\theta(r(y,z)), \qquad (7)$$
$$L_p(z\mid y)=\max_\theta f_\theta(y,z)=f_{\hat\theta_z}(y,z), \qquad (8)$$

where $\hat\theta_z$ is the value of θ maximizing $f_\theta(y,z)$.

Typically, $L_c$ and $L_p$ are quite similar when sufficiency provides a genuine reduction and the dimension of θ is small.

In linear normal models, $L_p$ ignores the number of parameters and can be misleadingly precise. A modification of $L_p$, $L_{mp}$, that adjusts for this was suggested by Butler (1986, rejoinder); see also Bjørnstad (1990). Let $Y=(X_1,\dots,X_n)$ and $Z=(X_{n+1},\dots,X_{n+m})$, and assume that all $X_i$'s are independent. Let $\theta=(\theta_1,\dots,\theta_k)$. Then $L_{mp}$ is given by

$$L_{mp}(z\mid y)=L_p(z\mid y)\,\big|I(\hat\theta_z)\big|^{-1/2}\,\big|H_zH_z^{T}\big|^{1/2}. \qquad (9)$$

Here, $I(\theta)=\{I_{ij}(\theta)\}$ is the "observed" information matrix based on (y, z), i.e. $I_{ij}(\theta)=-\partial^2\log f_\theta(y,z)/\partial\theta_i\partial\theta_j$, $H_z=H_z(\hat\theta_z)$, and $H_z(\theta)$ is the $k\times(n+m)$ matrix of second-order partial derivatives of $\log f_\theta(y,z)$ with respect to θ and (y, z). We shall assume that any L considered is normalized as a probability distribution in z. The mean and variance of L are then called the predictive expectation and the predictive variance of Z, denoted by $E_p(Z)$ and $V_p(Z)$. $E_p(Z)$ is a natural predictor for z, called the mean predictor. $L(z\mid y)$ also gives us an idea of how likely different z-values are in light of the data, and can be used to construct prediction intervals for z. An interval $(a_y,b_y)$ is a $(1-\alpha)$ predictive interval based on $L(z\mid y)$ if $\int_{a_y}^{b_y}L(z\mid y)\,dz=1-\alpha$. A simplified (quasi) $(1-\alpha)$ predictive interval is of the form

$$E_p(Z)\pm u\sqrt{V_p(Z)}, \qquad (10)$$

where u is the upper (α/2)-point in the actual (exact or approximate) conditional distribution, given y, of $(Z-E_\theta(Z\mid y))/\sqrt{V_\theta(Z\mid y)}$.
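The following sketch illustrates (8) and (10) in the simplest setting, which is our own example rather than the paper's: $X_1,\dots,X_n,X_{n+1}$ iid $N(\mu,\sigma^2)$ with $Z=X_{n+1}$. Maximizing $f_\theta(y,z)$ over $(\mu,\sigma^2)$ gives $L_p(z\mid y)\propto[S+n(z-\bar y)^2/(n+1)]^{-(n+1)/2}$ with $S=\sum(y_i-\bar y)^2$, a t density with n d.f. whose variance $S(n+1)/\{n(n-2)\}$ is smaller than the variance $S(n+1)/\{n(n-3)\}$ of the exact $t_{n-1}$ predictive distribution; this is the over-precision of $L_p$ mentioned above. Here $(Z-E_\theta(Z\mid y))/\sqrt{V_\theta(Z\mid y)}=(Z-\mu)/\sigma$ is exactly N(0, 1), so $u=1.96$ in (10) for $\alpha=.05$. The code normalizes $L_p$ numerically on a grid; the function names and data are hypothetical.

```python
import numpy as np


def log_profile_pred_lik(z, y):
    """log L_p(z | y), up to an additive constant, when Y_1..Y_n, Z are iid N(mu, sigma^2).

    Maximizing f_theta(y, z) over (mu, sigma^2) leaves
    L_p(z | y) proportional to [S + n (z - ybar)^2 / (n + 1)]^(-(n + 1) / 2).
    """
    y = np.asarray(y, float)
    n, ybar = y.size, y.mean()
    S = np.sum((y - ybar) ** 2)
    return -(n + 1) / 2 * np.log(S + n * (z - ybar) ** 2 / (n + 1))


def predictive_mean_var(y, half_width=200.0, num=400_001):
    """E_p(Z) and V_p(Z) of the normalized L_p(z | y), by Riemann sums on a grid."""
    y = np.asarray(y, float)
    s = y.std(ddof=1)
    z = np.linspace(y.mean() - half_width * s, y.mean() + half_width * s, num)
    log_L = log_profile_pred_lik(z, y)
    L = np.exp(log_L - log_L.max())  # rescale before exponentiating to avoid underflow
    dz = z[1] - z[0]
    L /= L.sum() * dz                # normalize L as a probability density in z
    Ep = np.sum(z * L) * dz
    Vp = np.sum((z - Ep) ** 2 * L) * dz
    return Ep, Vp


y = np.array([1.2, 0.4, 2.2, 1.9, 0.7, 1.4, 1.1, 2.5, 0.3, 1.6])
n, S = y.size, np.sum((y - y.mean()) ** 2)
Ep, Vp = predictive_mean_var(y)
print("V_p(Z) from L_p:", Vp, " closed form:", S * (n + 1) / (n * (n - 2)))
print("exact t_{n-1} predictive variance:", S * (n + 1) / (n * (n - 3)))
u = 1.959964  # (Z - mu)/sigma is exactly N(0, 1) in this example
print("quasi 95% interval (10):", Ep - u * np.sqrt(Vp), Ep + u * np.sqrt(Vp))
```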

3. PREDICTOR AND PREDICTION INTERVALS IN TWO-STAGE SAMPLING BASED ON PREDICTIVE LIKELIHOOD

In two-stage sampling, Z is given by (3), and its second part is a sum of a random number, $\sum_{i\notin s}M_i$, of random variables. Therefore, instead of considering a predictive likelihood for Z directly, we look at a joint predictive likelihood for Z and $M(\bar s)=(M_i,\ i\notin s)$. It has the following form:

$$L(z,m(\bar s)\mid y)=L_{m(\bar s)}(z\mid y)\,L(m(\bar s)\mid y). \qquad (11)$$

Here $L_{m(\bar s)}(z\mid y)$ is a predictive likelihood for z conditional on $M(\bar s)=m(\bar s)$, i.e. based on $f_\theta(y,z\mid m(\bar s))$, and $L(m(\bar s)\mid y)$ is a predictive likelihood for $m(\bar s)$ based on $f_\theta(y,m(\bar s))$. Then $E_p$ and $V_p$ follow the usual rules for double expectation, i.e.

$$E_p(Z)=E_p\{E_p(Z\mid M(\bar s))\},$$
$$V_p(Z)=E_p\{V_p(Z\mid M(\bar s))\}+V_p\{E_p(Z\mid M(\bar s))\}. \qquad (12)$$

In (12), $E_p(Z\mid m(\bar s))$ and $V_p(Z\mid m(\bar s))$ are the predictive mean and variance for Z from $L_{m(\bar s)}(z\mid y)$. In principle we can derive $L(z\mid y)$ as the marginal likelihood from $L(z,m(\bar s)\mid y)$. The advantage of (11) is that we are able to obtain $E_p(Z)$ and $V_p(Z)$ without actually deriving $L(z\mid y)$.

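Under the model (2) we can factorize $f_\theta(y,z,m(\bar s))=f_{\beta,\sigma}(m(s),m(\bar s))\cdot f_{\mu,\tau,\rho}(y(s),z\mid m(s),m(\bar s))$. Since the two factors involve the separate parameters $(\beta,\sigma)$ and $(\mu,\tau,\rho)$, the maximization over θ splits, and it is readily seen that applying $L_p$, given by (8), to the terms on the right-hand side of (11) in fact gives us (up to factors not depending on $(z,m(\bar s))$) $L_p(z,m(\bar s)\mid y)=\max_\theta f_\theta(y,z,m(\bar s))$, i.e.

$$L_p(z,m(\bar s)\mid y)=L_{m(\bar s),p}(z\mid y)\,L_p(m(\bar s)\mid y). \qquad (13)$$

It follows that $E_p(Z)$ and $V_p(Z)$ based on $L_p(z,m(\bar s)\mid y)$ can be derived by (12). We note that $L_c$, given by (7), has the same property, i.e. $L_c(z,m(\bar s)\mid y)=L_{m(\bar s),c}(z\mid y)\,L_c(m(\bar s)\mid y)$.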

Normal model 

It is now assumed that model (2) holds and that the $Y_{ij}$ and $M_i$ are normally distributed.

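We shall first consider the second likelihood in (11), $L(m(\bar s)\mid y)$, using $L_p$. Let $t_k^{\nu}(\Sigma)$ denote the k-dimensional multivariate t-distribution with ν degrees of freedom (d.f.) and scale matrix Σ, i.e. $t_k^{\nu}(\Sigma)$ is the distribution of $(U/W)\sqrt\nu$, where $U\sim N_k(0,\Sigma)$ and $W^2\sim\chi^2_\nu$ is independent of U. Let $x(\bar s)$ be the vector $(x_i: i\notin s)$. Then $L_p(m(\bar s)\mid y)$ leads to a multivariate t-distribution; specifically, $L_p(m(\bar s)\mid y)$ is such that $[M(\bar s)-\hat\beta x(\bar s)]/\hat\sigma\sim t^{n_0}_{N-n_0}(V)$, where the m.l.e. are $\hat\beta$ given by (5) and $\hat\sigma^2=\sum_{i\in s}(m_i-\hat\beta x_i)^2/v(x_i)\,\big/\,n_0$, and $V=(V_{ij})$ with $V_{ii}=v(x_i)+x_i^2/w_s$ and $V_{ij}=x_ix_j/w_s$ for $i\ne j$ (with $w_s$ as defined before (5)). It follows that $E_p(M_i)=\hat\beta x_i$, $V_p(M_i)=\frac{n_0}{n_0-2}\hat\sigma^2\{v(x_i)+x_i^2/w_s\}$, and the predictive covariances are $\mathrm{Cov}_p(M_i,M_j)=\frac{n_0}{n_0-2}\hat\sigma^2x_ix_j/w_s$ for $i\ne j$. This implies that

$$E_p\Big(\sum_{i\notin s}M_i\Big)=\hat\beta\sum_{i\notin s}x_i,\qquad V_p\Big(\sum_{i\notin s}M_i\Big)=\frac{n_0}{n_0-2}\,\hat\sigma^2\Big\{v_{\bar s}+\Big(\sum_{i\notin s}x_i\Big)^2\Big/w_s\Big\}. \qquad (14)$$

$L_c$ and $L_{mp}$ (for $M_i/\sqrt{v(x_i)}$, $i\notin s$) lead to moments similar to (14), with $n_0-2$ replaced by $n_0-5$ and $n_0-4$, respectively.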
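These predictive moments take only a few lines to assemble. The sketch below (names ours) computes the $L_p$-based quantities, with a parameter k standing for the $n_0-k$ in the denominator of the factor $n_0/(n_0-k)$ ($k=2$ for $L_p$; per the text, $k=5$ for $L_c$ and $k=4$ for $L_{mp}$). The moments of $\sum_{i\notin s}M_i$ are the sums of the predictive means and of all entries of the predictive covariance matrix; they are what enters $V_p\{E_p(Z\mid M(\bar s))\}$ in (12).

```python
import numpy as np


def predictive_moments_M(m_s, x_s, x_sbar, k=2, v=lambda x: x):
    """Predictive means/covariances of M(s-bar) and the moments of their sum.

    k = 2 corresponds to L_p (factor n0/(n0 - 2)); per the text, L_c and L_mp
    replace n0 - 2 by n0 - 5 and n0 - 4, i.e. k = 5 and k = 4.
    """
    m_s, x_s, x_sbar = (np.asarray(a, float) for a in (m_s, x_s, x_sbar))
    n0 = m_s.size
    w_s = np.sum(x_s**2 / v(x_s))
    beta = np.sum(m_s * x_s / v(x_s)) / w_s                 # (5)
    sigma2 = np.sum((m_s - beta * x_s) ** 2 / v(x_s)) / n0  # m.l.e. of sigma^2
    V = np.diag(v(x_sbar)) + np.outer(x_sbar, x_sbar) / w_s  # V_ii and V_ij
    scale = n0 / (n0 - k) * sigma2
    mean, cov = beta * x_sbar, scale * V
    # Moments of sum_{i not in s} M_i: sum of the means and of all covariance entries
    sum_mean = beta * x_sbar.sum()
    sum_var = scale * (np.sum(v(x_sbar)) + x_sbar.sum() ** 2 / w_s)
    return mean, cov, sum_mean, sum_var


# Hypothetical sample with the size measures of case (I), s = (1, ..., 6)
mean, cov, sum_mean, sum_var = predictive_moments_M(
    m_s=[52, 47, 55, 31, 28, 97], x_s=[50, 50, 50, 30, 30, 100], x_sbar=[100, 100, 50, 50])
```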

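Let us now consider the first term in (11), $L_{m(\bar s)}(z\mid y)$, based on $f_\theta(y,z\mid m(\bar s))$. For this likelihood we will restrict attention to $L_p$, i.e. derive $L_{m(\bar s),p}(z\mid y)$. The m.l.e. $\hat\mu,\hat\tau^2,\hat\rho$ can be expressed in the following way:

$$\hat\mu=\frac{\sum_{i\in s}n_i\bar y_i/(1-\hat\rho+n_i\hat\rho)}{\sum_{i\in s}n_i/(1-\hat\rho+n_i\hat\rho)},\qquad \hat\tau^2=\frac1n\left(\frac{SSE}{1-\hat\rho}+\sum_{i\in s}\frac{n_i(\bar y_i-\hat\mu)^2}{1-\hat\rho+n_i\hat\rho}\right), \qquad (15)$$

and $\hat\rho$ is found numerically, maximizing $-(n/2)\log\hat\tau^2-(1/2)\sum_{i\in s}\log(1-\rho+n_i\rho)-((n-n_0)/2)\log(1-\rho)$ over ρ, with $\hat\mu$ and $\hat\tau^2$ given by (15) evaluated at ρ. Here, $SSE=\sum_{i\in s}\sum_{j\in s_i}(y_{ij}-\bar y_i)^2$. When $n_i=c$ for all $i\in s$, then $\hat\mu=\bar y=\sum_{i\in s}\bar y_i/n_0$, $\hat\tau^2=SS/n$ and $\hat\rho=\max\{0,\ 1-n\,SSE/((n-n_0)SS)\}$, where $SS=\sum_{i\in s}\sum_{j\in s_i}(y_{ij}-\bar y)^2$.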
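A sketch of (15) and the numerical maximization over ρ, under our own assumptions: a plain grid over $\rho\in[0,0.999]$ stands in for whatever optimizer is actually used (the text does not say), the function names are hypothetical, and the equal-size closed form serves as a check on the numerical route.

```python
import numpy as np


def mle_mu_tau2_rho(groups, num=20_001):
    """m.l.e. (mu, tau^2, rho) from the within-cluster samples y_i1..y_in_i, i in s.

    rho maximizes the profile log-likelihood over a grid on [0, 0.999]; this grid
    search is only a stand-in for a proper numerical optimizer.
    """
    groups = [np.asarray(g, float) for g in groups]
    n_i = np.array([g.size for g in groups], float)
    ybar = np.array([g.mean() for g in groups])
    n, n0 = n_i.sum(), n_i.size
    SSE = sum(np.sum((g - g.mean()) ** 2) for g in groups)

    def profile(rho):
        d = 1 - rho + n_i * rho
        mu = np.sum(n_i * ybar / d) / np.sum(n_i / d)                      # (15)
        tau2 = (SSE / (1 - rho) + np.sum(n_i * (ybar - mu) ** 2 / d)) / n  # (15)
        loglik = (-(n / 2) * np.log(tau2) - 0.5 * np.sum(np.log(d))
                  - (n - n0) / 2 * np.log(1 - rho))
        return loglik, mu, tau2

    grid = np.linspace(0.0, 0.999, num)
    rho = grid[np.argmax([profile(r)[0] for r in grid])]
    _, mu, tau2 = profile(rho)
    return mu, tau2, rho


def mle_equal_sizes(groups):
    """Closed form when n_i = c for all i in s."""
    y = np.concatenate([np.asarray(g, float) for g in groups])
    n, n0 = y.size, len(groups)
    SS = np.sum((y - y.mean()) ** 2)
    SSE = sum(np.sum((np.asarray(g, float) - np.mean(g)) ** 2) for g in groups)
    return y.mean(), SS / n, max(0.0, 1 - n * SSE / ((n - n0) * SS))
```

With unequal $n_i$ only the first function applies; with equal $n_i$ the two should agree up to the grid spacing.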

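Consider first the case when ρ and τ are known. Then $\hat\mu$ is given by (15) with ρ replacing $\hat\rho$. In this case $L_{m(\bar s),p}(z\mid y)$ is such that Z is normally distributed with

$$E_p(Z\mid m(\bar s))=\sum_{i\in s}(m_i-n_i)\left(\frac{1-\rho}{1-\rho+n_i\rho}\,\hat\mu+\frac{n_i\rho}{1-\rho+n_i\rho}\,\bar y_i\right)+\hat\mu\sum_{i\notin s}m_i, \qquad (16)$$

$$V_p(Z\mid m(\bar s))=V(Z\mid y,m(\bar s))+\tau^2\left(\sum_{i\in s}\frac{n_i}{1-\rho+n_i\rho}\right)^{-1}\left(\sum_{i\notin s}m_i+\sum_{i\in s}(m_i-n_i)\frac{1-\rho}{1-\rho+n_i\rho}\right)^2. \qquad (17)$$

Here $V(Z\mid y,m(\bar s))=\tau^2\sum_{i\in s}(m_i-n_i)(1-\rho)(1-\rho+m_i\rho)/(1-\rho+n_i\rho)+\tau^2\sum_{i\notin s}m_i(1-\rho+m_i\rho)$ is the model variance of Z given y and $M(\bar s)=m(\bar s)$, and the second term in (17) is the squared coefficient of $\hat\mu$ in (16) times $\mathrm{Var}(\hat\mu)=\tau^2\big/\sum_{i\in s}n_i/(1-\rho+n_i\rho)$.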
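A sketch of (16) and (17) for known ρ and τ² (names ours). With $\rho=0$ it reduces to the familiar result for independent observations: $\hat\mu$ is the overall sample mean, and the extra term in (17) is $\tau^2(\text{number of unobserved units})^2/n$.

```python
import numpy as np


def conditional_moments(m_s, n_s, ybar_s, m_sbar, tau2, rho):
    """E_p(Z | m(s-bar)) and V_p(Z | m(s-bar)) of (16)-(17), with rho and tau^2 known."""
    m_s, n_s, ybar_s, m_sbar = (np.asarray(a, float) for a in (m_s, n_s, ybar_s, m_sbar))
    d = 1 - rho + n_s * rho
    c = n_s / d                              # Var(ybar_i) = tau^2 / c_i
    mu = np.sum(c * ybar_s) / np.sum(c)      # (15) with rho known
    w, r = (1 - rho) / d, m_s - n_s
    mean = np.sum(r * (w * mu + (1 - w) * ybar_s)) + mu * np.sum(m_sbar)  # (16)
    # V(Z | y, m(s-bar)): sampled clusters given ybar_i, plus whole non-sampled clusters
    v_cond = tau2 * (np.sum(r * (1 - rho) * (1 - rho + m_s * rho) / d)
                     + np.sum(m_sbar * (1 - rho + m_sbar * rho)))
    coef = np.sum(r * w) + np.sum(m_sbar)    # coefficient of mu in (16)
    return mean, v_cond + tau2 * coef**2 / np.sum(c)                       # (17)
```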



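When ρ, τ are unknown, $L_{m(\bar s),p}(z\mid y)$ will for large $n_0$ be approximately such that Z is normally distributed with $E_p(Z\mid m(\bar s))$ and $V_p(Z\mid m(\bar s))$ given by (16) and (17) with $\hat\rho,\hat\tau^2$ replacing $\rho,\tau^2$. It now follows from (13), (14), (16) and (17) that, approximately, $L_p(z,m(\bar s)\mid y)$ has $E_p(Z)=E_{\hat\theta}(Z\mid y)$ and

$$V_p(Z)=V_{\hat\theta}(Z\mid y)+\frac{\hat\tau^2}{C}\Big(\sum_{i\in s}(m_i-n_i)\hat w_i+\hat\beta X_{\bar s}\Big)^2+h(2). \qquad (18)$$

Here, $V_{\hat\theta}(Z\mid y)$ is given by (6), $\hat w_i=(1-\hat\rho)/(1-\hat\rho+n_i\hat\rho)$, $C=\sum_{i\in s}n_i/(1-\hat\rho+n_i\hat\rho)$, $X_{\bar s}=\sum_{i\notin s}x_i$, $u_{\bar s}=\sum_{i\notin s}x_i^2$, and

$$h(k)=\hat\sigma^2\left\{\frac{n_0}{n_0-k}\Big(v_{\bar s}+\frac{X_{\bar s}^2}{w_s}\Big)\Big(\hat\mu^2+\frac{\hat\tau^2}{C}\Big)-\hat\mu^2v_{\bar s}+\hat\tau^2\hat\rho\left[\frac{n_0}{n_0-k}\Big(v_{\bar s}+\frac{u_{\bar s}}{w_s}\Big)-v_{\bar s}\right]\right\}.$$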

The predictive likelihood

$$L^{(p,c)}(z,m(\bar s)\mid y)=L_{m(\bar s),p}(z\mid y)\,L_c(m(\bar s)\mid y)$$

leads to the same $E_p(Z)$, while $V_p(Z)$ equals (18) with h(5) instead of h(2). With

$$L^{(p,mp)}(z,m(\bar s)\mid y)=L_{m(\bar s),p}(z\mid y)\,L_{mp}(m(\bar s)\mid y)$$

we get the same $E_p(Z)$ and $V_p(Z)$ equal to (18) with h(4).

It can be shown that, conditional on y, $(Z-E_\theta(Z\mid y))/\sqrt{V_\theta(Z\mid y)}$ is asymptotically N(0, 1) as $N-n_0\to\infty$, provided that the $x_i$'s are bounded as $N-n_0\to\infty$. Hence $Z\mid y$ is approximately normal for large $N-n_0$, and the quasi $(1-\alpha)$ predictive interval given by (10) becomes

$$E_{\hat\theta}(Z\mid y)\pm u(\alpha/2)\sqrt{V_p(Z)},$$

where $u(\alpha/2)$ is the upper α/2-point in N(0, 1). This amounts to regarding $N(E_p(Z),V_p(Z))$ as a predictive distribution for Z. $V_p(Z)$ equals (18) if the interval is based on $L_p(z,m(\bar s)\mid y)$, while $L^{(p,c)}$ gives (18) with h(5) and $L^{(p,mp)}$ gives (18) with h(4). Let us denote these prediction intervals by $I_p$, $I_{pc}$ and $I_{mp}$. Since $h(2)<h(4)<h(5)$, clearly $I_p\subset I_{mp}\subset I_{pc}$.
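The three intervals can be computed side by side. The sketch below (names ours) evaluates $\hat Z_0$, $V_{\hat\theta}(Z\mid y)$ from (6), and $V_p(Z)$ as the double-expectation combination (12) of the moments (16)-(17) with the predictive moments of $M(\bar s)$ carrying the factor $n_0/(n_0-k)$. The split into `base` and `h(k)` follows that derivation and is our own bookkeeping, so treat it as a sketch rather than a definitive implementation. The inputs $\hat\mu,\hat\tau^2,\hat\rho$ are assumed to come from (15), and the example data are hypothetical.

```python
import numpy as np


def three_intervals(m_s, n_s, ybar_s, x_s, x_sbar, mu, tau2, rho,
                    v=lambda x: x, u=1.959964):
    """Z0 and the intervals I_p, I_mp, I_pc, i.e. V_p(Z) with h(2), h(4), h(5).

    mu, tau2, rho are the m.l.e. from (15); beta and sigma^2 are computed here.
    """
    m_s, n_s, ybar_s, x_s, x_sbar = (np.asarray(a, float)
                                     for a in (m_s, n_s, ybar_s, x_s, x_sbar))
    n0 = m_s.size
    w_s = np.sum(x_s**2 / v(x_s))
    beta = np.sum(m_s * x_s / v(x_s)) / w_s                 # (5)
    sigma2 = np.sum((m_s - beta * x_s) ** 2 / v(x_s)) / n0  # m.l.e. of sigma^2
    X, u_sbar, v_sbar = x_sbar.sum(), np.sum(x_sbar**2), np.sum(v(x_sbar))
    d = 1 - rho + n_s * rho
    w, r, C = (1 - rho) / d, m_s - n_s, np.sum(n_s / d)
    Z0 = np.sum(r * (w * mu + (1 - w) * ybar_s)) + beta * X * mu
    V_theta = (tau2 * np.sum(r * (1 - rho) * (1 - rho + m_s * rho) / d)
               + tau2 * (beta * X + rho * (sigma2 * v_sbar + beta**2 * u_sbar - beta * X))
               + mu**2 * sigma2 * v_sbar)                    # (6) at theta-hat
    base = V_theta + tau2 / C * (np.sum(r * w) + beta * X) ** 2  # estimation of mu

    def h(k):  # extra terms from predicting M(s-bar) with estimated (beta, sigma)
        a = n0 / (n0 - k)
        return sigma2 * (a * (v_sbar + X**2 / w_s) * (mu**2 + tau2 / C) - mu**2 * v_sbar
                         + tau2 * rho * (a * (v_sbar + u_sbar / w_s) - v_sbar))

    intervals = {}
    for name, k in (("I_p", 2), ("I_mp", 4), ("I_pc", 5)):
        half = u * np.sqrt(base + h(k))
        intervals[name] = (Z0 - half, Z0 + half)
    return Z0, intervals


Z0, I = three_intervals([52, 47, 55, 31, 28, 97], [3] * 6, [5.1, 4.8, 5.3, 4.9, 5.2, 5.0],
                        [50, 50, 50, 30, 30, 100], [100, 100, 50, 50], 5.05, 1.0, 0.3)
```

Because the factor $n_0/(n_0-k)$ grows with k, the nesting $I_p\subset I_{mp}\subset I_{pc}$ holds for any data.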

For large $n_0$ there is practically no difference between these intervals. However, for small $n_0$ they do differ. To find out how the intervals perform for small $n_0$ (and small N), a simulation study with $n_0=6$ and $N=10$ was done to estimate the confidence levels $C_p=P(Z\in I_p(Y))$, $C_{pc}=P(Z\in I_{pc}(Y))$ and $C_{mp}=P(Z\in I_{mp}(Y))$, all conditional on s. The approximations to $L_{m(\bar s),p}$ and to the distribution of Z given y are not valid for small $n_0$ and small $N-n_0$. Still, it is of interest to find out what the coverage properties of the different intervals are in this case. In a later paper a more comprehensive simulation study will be undertaken, including also cases with large $n_0$ and $N-n_0$.

The simulation study considers the following two main cases, with $s=(1,2,3,4,5,6)$, $1-\alpha=.95$, $v(x)=x$ and $n_i=c$ for all $i\in s$. (I) $x_1=x_2=x_3=50$, $x_4=x_5=30$, $x_6=x_7=x_8=100$, $x_9=x_{10}=50$; $c=3, 10$. (II) $x_1=x_2=x_3=5000$, $x_4=x_5=3000$, $x_6=x_7=x_8=10000$, $x_9=x_{10}=5000$; $c=10, 400$.

Case (I): Two values of μ are considered, μ = 5, 100. For μ > 100 the confidence levels seemed to be essentially equal to those for μ = 100. With regard to (σ, β), the levels seemed to depend essentially on the ratio β/σ, and we consider β/σ = .75, 1, 1.5, 2, 3.

(Ia): μ = 100, τ = 1, 5 and ρ = .1, .5, .9. The confidence levels are approximately constant for all the various chosen values of θ. Based on simulation of 60,000 observations of (y, z), we find $C_p=.924$, $C_{mp}=.973$, $C_{pc}=.992$.



Table 1. Confidence levels for case (I) and μ = 5, τ = 1; 1 − α = .95.

β/σ:           .75    1      1.5    2      3
Cp,  ρ = .1    .929   .936   .932   .937   .927
Cp,  ρ = .5    .930   .921   .913   .899   .891
Cp,  ρ = .9    .920   .923   .899   .895   .889
Cmp, ρ = .1    .971   .973   .962   .961   .948
Cmp, ρ = .5    .967   .959   .943   .927   .905
Cmp, ρ = .9    .961   .952   .930   .919   .904
Cpc, ρ = .1    .990   .991   .984   .980   .967
Cpc, ρ = .5    .989   .981   .971   .958   .930
Cpc, ρ = .9    .986   .978   .958   .948   .922



Table 2. Confidence levels for case (II) and σ = β = 1, 1 − α = .95.

τ/μ:            .01       .05       .20    .20
c:              10, 400   10, 400   10     400
Cp,  ρ = .01    .923      .936      .940   .926
Cp,  ρ = .10    .923      .929      .920   .874
Cp,  ρ = .50    .929      .896      .870   .864
Cmp, ρ = .01    .97?      .970      .944   .949
Cmp, ρ = .10    .973      .961      .923   .889
Cmp, ρ = .50    .971      .922      .872   .866
Cpc, ρ = .01    .994      .989      .951   .975
Cpc, ρ = .10    .994      .982      .928   .911
Cpc, ρ = .50    .994      .951      .876   .873
(Ib): μ = 5, τ = 1. Table 1 is based on simulation of 5000 observations of (y, z) in each case.

Case (II): We consider β = σ = 1. It seems that the confidence levels depend on (μ, τ) only through the coefficient of variation τ/μ. Table 2 is based on simulation of 5000 observations of (y, z) in each case.
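The tables do not report Monte Carlo error. As a rough guide, assuming independent replications, an estimated level near .95 based on R replications has binomial standard error $\sqrt{.95\cdot.05/R}$. The short computation below gives about .003 for R = 5000 and about .001 for R = 60,000, so differences of a few units in the third decimal of Tables 1 and 2 should not be over-interpreted.

```python
import math


def mc_standard_error(level, reps):
    """Binomial standard error of a coverage probability estimated from reps runs."""
    return math.sqrt(level * (1 - level) / reps)


for reps in (5000, 60000):
    print(reps, round(mc_standard_error(0.95, reps), 4))
# 5000 0.0031
# 60000 0.0009
```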

When $n_0=6$ and $N=10$, $I_p$ is clearly too short in general. $I_{pc}$ is typically too wide, especially when ρ is only moderately large. Overall, of the three intervals $I_{mp}$ seems to have confidence levels closest to .95.